
A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data

Alqazlan, Lama, Fang, Zheng, Castelle, Michael, Procter, Rob

arXiv.org Artificial Intelligence

The availability of big data has significantly influenced the possibilities and methodological choices for conducting large-scale behavioural and social science research. In the context of qualitative data analysis, a major challenge is that conventional methods require intensive manual labour and are often impractical to apply to large datasets. One effective way to address this issue is by integrating emerging computational methods to overcome scalability limitations. However, a critical concern for researchers is the trustworthiness of results when Machine Learning (ML) and Natural Language Processing (NLP) tools are used to analyse such data. We argue that confidence in the credibility and robustness of results depends on adopting a 'human-in-the-loop' methodology that is able to provide researchers with control over the analytical process, while retaining the benefits of using ML and NLP. With this in mind, we propose a novel methodological framework for Computational Grounded Theory (CGT) that supports the analysis of large qualitative datasets, while maintaining the rigour of established Grounded Theory (GT) methodologies. To illustrate the framework's value, we present the results of testing it on a dataset collected from Reddit in a study aimed at understanding tutors' experiences in the gig economy.


Using Zero-shot Prompting in the Automatic Creation and Expansion of Topic Taxonomies for Tagging Retail Banking Transactions

Moraes, Daniel de S., Santos, Pedro T. C., da Costa, Polyana B., Pinto, Matheus A. S., Pinto, Ivan de J. P., da Veiga, Álvaro M. G., Colcher, Sergio, Busson, Antonio J. G., Rocha, Rafael H., Gaio, Rennan, Miceli, Rafael, Tourinho, Gabriela, Rabaioli, Marcos, Santos, Leandro, Marques, Fellipe, Favaro, David

arXiv.org Artificial Intelligence

This work presents an unsupervised method for automatically constructing and expanding topic taxonomies by using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies and LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot prompting to find out where to add new nodes, which, to our knowledge, is the first work to present such an approach to taxonomy tasks. We use the resulting taxonomies to assign tags that characterize merchants from a retail bank dataset. To evaluate our work, we asked 12 volunteers to answer a two-part form in which we first assessed the quality of the taxonomies created and then the tags assigned to merchants based on that taxonomy. The evaluation revealed a coherence rate exceeding 90% for the chosen taxonomies, while the average coherence for merchant tagging surpassed 80%.
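The zero-shot node-placement step can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the prompt wording, the `ask_llm` callable, and the taxonomy structure are all hypothetical, and a real system would call an actual LLM API where the stub lambda appears.

```python
def build_placement_prompt(taxonomy_paths, new_term):
    """Zero-shot prompt asking an LLM where a new term fits in the taxonomy."""
    listing = "\n".join(f"- {' > '.join(path)}" for path in taxonomy_paths)
    return (
        "You are given a topic taxonomy:\n"
        f"{listing}\n"
        f"Under which existing node should the new term '{new_term}' be added? "
        "Answer with the node name only."
    )

def place_term(taxonomy_paths, new_term, ask_llm):
    """Ask the model, and accept the answer only if it names a known node."""
    answer = ask_llm(build_placement_prompt(taxonomy_paths, new_term)).strip()
    known = {node for path in taxonomy_paths for node in path}
    return answer if answer in known else None

# Stubbed model call for illustration; a real system would query an LLM here.
taxonomy = [("Retail", "Groceries"), ("Retail", "Electronics"), ("Services", "Travel")]
node = place_term(taxonomy, "smartphones", lambda prompt: "Electronics")
```

Validating the answer against the set of known nodes is one simple guard against the model hallucinating a node that does not exist in the taxonomy.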


"Mama Always Had a Way of Explaining Things So I Could Understand'': A Dialogue Corpus for Learning to Construct Explanations

Wachsmuth, Henning, Alshomary, Milad

arXiv.org Artificial Intelligence

As AI is more and more pervasive in everyday life, humans have an increasing demand to understand its behavior and decisions. Most research on explainable AI builds on the premise that there is one ideal explanation to be found. In fact, however, everyday explanations are co-constructed in a dialogue between the person explaining (the explainer) and the specific person being explained to (the explainee). In this paper, we introduce a first corpus of dialogical explanations to enable NLP research on how humans explain as well as on how AI can learn to imitate this process. The corpus consists of 65 transcribed English dialogues from the Wired video series "5 Levels", explaining 13 topics to five explainees of different proficiency. All 1550 dialogue turns have been manually labeled by five independent professionals for the topic discussed as well as for the dialogue act and the explanation move performed. We analyze linguistic patterns of explainers and explainees, and we explore differences across proficiency levels. BERT-based baseline results indicate that sequence information helps predict topics, acts, and moves effectively.
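The per-turn annotation scheme described above (topic, dialogue act, explanation move) can be represented with a simple record. The field names and label values below are illustrative assumptions, not the corpus's actual schema.

```python
from dataclasses import dataclass

@dataclass
class DialogueTurn:
    """One annotated turn from a dialogical explanation (hypothetical schema)."""
    speaker: str           # "explainer" or "explainee"
    text: str              # the transcribed utterance
    topic: str             # one of the 13 explained topics
    dialogue_act: str      # e.g. informing or questioning
    explanation_move: str  # e.g. introducing a concept

turn = DialogueTurn("explainer", "A blockchain is a shared ledger.",
                    "blockchain", "inform", "introduce-concept")
```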


Predicting IMDb Rating of TV Series with Deep Learning: The Case of Arrow

Gomes, Anna Luiza, Vianna, Getúlio, Escovedo, Tatiana, Kalinowski, Marcos

arXiv.org Artificial Intelligence

Context: The number of TV series offered nowadays is very high. Because of this abundance, many series are canceled for lack of originality and the low audiences that follow. Problem: A decision support system that can show why some shows are a huge success and others are not would facilitate decisions about renewing or starting a show. Solution: We studied the case of the series Arrow, broadcast by CW Network, and used descriptive and predictive modeling techniques to predict the IMDb rating. We assumed that the theme of an episode would affect its evaluation by users, so the dataset is composed only of the director of the episode, the number of reviews the episode received, the percentage of each theme extracted by the Latent Dirichlet Allocation (LDA) model, the number of viewers (from Wikipedia), and the rating from IMDb. The LDA model is a generative probabilistic model of a collection of documents made up of words. Method: In this prescriptive research, the case study method was used, and its results were analyzed using a quantitative approach. Summary of Results: With the features of each episode, the model that performed best at predicting the rating was CatBoost, as it achieved a mean squared error similar to the KNN model's but a lower standard deviation during the test phase. It was possible to predict IMDb ratings with an acceptable root mean squared error of 0.55.
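The LDA step that turns episode text into per-theme percentage features can be sketched with scikit-learn. This is a simplified stand-in, not the paper's pipeline: the toy synopses, topic count, and preprocessing are illustrative assumptions.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy episode synopses standing in for the real Arrow episode texts.
synopses = [
    "oliver fights crime in starling city with arrows",
    "felicity hacks a server to trace the villain",
    "the team trains and oliver fights a new villain",
    "felicity and the team trace a hacker in the city",
]

counts = CountVectorizer().fit_transform(synopses)
lda = LatentDirichletAllocation(n_components=2, random_state=0)

# One row per episode; each row is that episode's theme distribution,
# summing to 1, ready to join director, review count, and viewers as
# features for the rating model.
theme_share = lda.fit_transform(counts)
```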


Optical Character Recognition (OCR) in Python

#artificialintelligence

Within the area of Computer Vision is the sub-area of Optical Character Recognition (OCR), which aims to transform images into text. OCR can be described as converting images containing typed, handwritten, or printed text into characters that a machine can understand. It is possible to convert scanned or photographed documents into text that can be edited in any tool, such as Microsoft Word.
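A minimal sketch of that under-10-lines recipe, assuming the Tesseract OCR engine and the `pytesseract` and Pillow packages are installed; the file name `document.png` is a placeholder for whatever scanned page you want to convert.

```python
def ocr_image(path):
    """Extract text from an image file using the Tesseract OCR engine."""
    # Imports are local so the sketch can be defined even where
    # Tesseract itself is not installed.
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(path))

# Usage (placeholder path):
# text = ocr_image("document.png")
# print(text)
```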


Topics of interest – Humanities Seminar 2021

#artificialintelligence

Conference Table 1 – Democracy and artificial intelligence: In the last decade, politically stable countries with a long-held and firm commitment to freedom of expression have witnessed their public debates plunge into an abyss. On social network platforms, anonymous and insidious mass manipulation techniques have corrupted public opinion. Democracy has spawned its opposite, electing leaders who pit themselves against the Democratic Rule of Law. At the center of the global revolution, shady strategies implemented by ultra-complex algorithms have learned how to exploit the fears and desires of crowds and individuals to engender fanaticism and irrationality. Politics has lost connection with the knowledge of factual truth, while obscurantist leaders strive to undermine fundamental rights, science, and human dignity.


Semi-Supervised Learning Approach to Discover Enterprise User Insights from Feedback and Support

Deng, Xin, Smith, Ross, Quintin, Genevieve

arXiv.org Machine Learning

With the evolution of the cloud and customer-centric culture, we inherently accumulate huge repositories of textual reviews, feedback, and support data. This has driven enterprises to seek and research engagement patterns, user network analysis, topic detection, etc. However, substantial manual work is still necessary to mine actionable outcomes from the data. In this paper, we propose and develop an innovative Semi-Supervised Learning approach that utilizes Deep Learning and Topic Modeling to gain a better understanding of the user voice. The approach combines a BERT-based multi-classification algorithm, trained through supervised learning, with a novel Probabilistic and Semantic Hybrid Topic Inference (PSHTI) model, trained through unsupervised learning, aiming to automate the identification of the main topics or areas as well as the sub-topics in textual feedback and support data. There are three major breakthroughs: 1. Despite the tremendous innovations that deep learning has brought to NLP, traditional topic modeling has lagged behind this tide; methodologically, we adopt transfer learning to fine-tune a BERT-based multi-classification system that categorizes the main topics, and then utilize the novel PSHTI model to infer the sub-topics under the predicted main topics. 2. Traditional unsupervised topic models and clustering methods suffer from the difficulty of automatically generating meaningful topic labels, but our system maps the top words to self-help issues by utilizing domain knowledge about the product obtained through web crawling. 3. This work provides a prominent showcase of leveraging state-of-the-art methodology in real production to help discover user insights and drive business investment priorities.
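The two-stage idea (a supervised main-topic classifier followed by unsupervised sub-topic inference within each predicted main topic) can be illustrated with a greatly simplified stand-in. Note the swaps: logistic regression over TF-IDF replaces the BERT classifier, and plain NMF replaces the authors' PSHTI model; the feedback texts and labels are toy assumptions.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy enterprise feedback with main-topic labels.
feedback = [
    "app crashes when I open settings", "crash on startup every time",
    "please add dark mode", "feature request: export to csv",
]
main_topics = ["bug", "bug", "feature", "feature"]

vec = TfidfVectorizer()
X = vec.fit_transform(feedback)

# Stage 1: supervised main-topic classifier (BERT in the paper).
clf = LogisticRegression().fit(X, main_topics)
pred = clf.predict(vec.transform(["the app crashes constantly"]))

# Stage 2: unsupervised sub-topic inference within one predicted main topic
# (the paper's PSHTI model; plain NMF here as a simplified stand-in).
bug_rows = [i for i, t in enumerate(main_topics) if t == "bug"]
sub_topics = NMF(n_components=2, init="nndsvd",
                 random_state=0).fit_transform(X[bug_rows])
```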


AiNews.page: AI & ML & AR News & More - Apps on Google Play

#artificialintelligence

AiNews.page brings together the best news on AI topics from reliable sources and platforms. We've curated and vetted the content, so you don't have to worry about its veracity and usefulness. We understand that your time is limited. That's why we only include the articles that will benefit you the most. For ease of use, we've divided AiNews.page into three distinct sections -- News, Social Media Engagement and Chat.



Google's John Mueller on Ranking for Featured Snippets - Search Engine Journal

#artificialintelligence

Someone asked John Mueller in a Webmaster Hangout whether Schema structured data helps a page rank for featured snippets and show up in voice search via the Google Assistant. Structured data is useful for communicating a large amount of precise information. Mueller answered by describing what it takes to make it easier for Google to use a page for featured snippets.
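As an illustration of the kind of Schema markup the question refers to, here is a hypothetical JSON-LD snippet marking up a question-and-answer page; per Mueller's point, such markup helps Google understand the page but does not by itself guarantee a featured snippet.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is a featured snippet?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "A short answer Google shows above the regular search results."
    }
  }]
}
```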